Image processing apparatus and image processing method

ABSTRACT

An image processing apparatus is provided for realizing a higher speed print-out of a scan image. It is determined whether an input data type stored in metadata of document data is PDL or not (S1609). If the input data type is “full-page image”, page data is divided into blocks and a thread is allotted to each of the blocks (S1608). If the input data type is not “full-page image”, the process goes to S1603. Subsequently, DL data is generated from vector data in the document, the DL data is added to the document, and the DL data is rendered into a bit map (S1603 to S1605). If the threads are processed by a plurality of processors, respectively, it becomes possible to carry out the processing in parallel and thereby to realize higher speed processing, when the input data type is “full-page image”.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an image processing apparatus or the like which forms and outputs an image of scan data and page description data.

2. Description of the Related Art

Japanese Patent Laid-Open No. 2006-23942 discloses a technique of converting a bit map image, which is input with an image input device, into drawing data which does not depend on a resolution of the image input device. Such processing of converting a bit map image into resolution-independent data is called vector conversion or vectorization, and the data obtained as a result of the vector conversion is called vector data. By utilizing this technique, it is possible to convert the input image into the resolution-independent vector data, and to reuse the image at an arbitrary size without degrading the image.

Further, Japanese Patent Laid-Open No. H11-170657 (1999) discloses a technique which performs drawing processing in parallel by distributing PDL (Page Description Language) data to a plurality of drawing processors.

Moreover, Japanese Patent Laid-Open No. 2007-108861 discloses a technique which switches the CPU or the frequency to be used between PDL data and raster data.

Here, the vector data generated from a scan image is characteristically simple drawing data but has a lot of redundancy compared to the vector data generated from the PDL data, since the original of the former vector data is image data.

However, when the vector data generated from the scan image is printed out, drawing development processing thereof is performed in the same way as for the PDL data regardless of the original data, and therefore, there has been a problem that the drawing development processing efficiency is low for the vector data generated from the scan image.

Further, for distributing the PDL data to the plurality of drawing processors, it is necessary to analyze page data again after scanning the page data and performing processing to divide the PDL data, and therefore, there has been a problem that excess time is required for dividing the PDL data.

Still further, for displaying the vector data on a PC, it is necessary to convert the vector data into a universal format such as PDF (Portable Document Format). However, there has been a problem that the universal format data, which has been converted originally from the PDL data, sometimes becomes considerably redundant after the conversion because of a difference in drawing models.

Moreover, when a search is made of the stored vector data, in which data generated from the PDL data is mixed, there has been a problem that the search with a character string cannot be made as desired.

The technique disclosed in Japanese Patent Laid-Open No. 2007-108861 does not provide any method for solving the problems for such various functions other than PDL data analysis and rendering.

SUMMARY OF THE INVENTION

An object of the present invention is to provide a higher speed print-out for the scan image. Further, another object of the present invention is to provide an improved efficiency of conversion from the PDL data to the universal format. Moreover, still another object of the present invention is to provide an improved convenience in document search.

For achieving the above objects, an image processing apparatus of the present invention includes: a reception component receiving PDL data from an external apparatus; an input component inputting scan image data from a scanner; a first conversion component converting at least a part of an area in the data input by the reception component and the input component into resolution-independent data which does not depend on a resolution of the input component; a second conversion component converting subsidiary information obtained when generating the resolution-independent data into additional information data as additional information which is not subjected to print processing; a component retaining the resolution-independent data and the additional information data in association with each other; a component retaining information indicating which of the image data and the PDL data is a type of the data input into the first conversion component; and a component switching data conversion processing depending on the type of the data input into the first conversion component.

Further, an image processing method of the present invention includes the steps of: a reception component receiving PDL data from an external apparatus; an input component inputting scan image data from a scanner; converting at least a part of an area of the data input by the reception component and the input component into resolution-independent data which does not depend on a resolution of the input component; converting subsidiary information obtained when generating the resolution-independent data into additional information data as additional information which is not subjected to print processing; retaining the resolution-independent data and the additional information data in association with each other; retaining information indicating which of the image data and the PDL data is a type of the data input by the reception component and the input component; and switching data conversion processing depending on the data type.

Still further, a computer-readable recording medium of the present invention records a program for causing a computer to execute a method comprising the steps of: a reception component receiving PDL data from an external apparatus; an input component inputting scan image data from a scanner; converting at least a part of an area of the data input by the reception component and the input component into resolution-independent data which does not depend on a resolution of the input component; converting subsidiary information obtained when generating the resolution-independent data into additional information data as additional information which is not subjected to print processing; retaining the resolution-independent data and the additional information data in association with each other; retaining information indicating which of the image data and the PDL data is a type of the data input by the reception component and the input component; and switching data conversion processing depending on the data type.

Moreover, a program of the present invention causes a computer to execute a method including the steps of: a reception component receiving PDL data from an external apparatus; an input component inputting scan image data from a scanner; converting at least a part of an area of the data input by the reception component and the input component into resolution-independent data which does not depend on a resolution of the input component; converting subsidiary information obtained when generating the resolution-independent data into additional information data as additional information which is not subjected to print processing; retaining the resolution-independent data and the additional information data in association with each other; retaining information indicating which of the image data and the PDL data is a type of the data input by the reception component and the input component; and switching data conversion processing depending on the data type.

According to the present invention, it becomes possible to realize a high speed print-out by switching processing depending on an original data type of the generated vector data.

Further, it becomes possible to realize more efficient conversion from the vector data to the universal format similarly by switching the processing depending on the original data type.

Moreover, it becomes possible to improve convenience in document search similarly by switching the processing depending on the original data type.

Further features of the present invention will become apparent from the following description of exemplary embodiments (with reference to the attached drawings).

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a side cross-sectional view showing a structure of a printing machine (MFP) according to an embodiment of the present invention;

FIG. 2 is a block diagram showing a configuration example of a control unit in each equipment of an embodiment;

FIG. 3 is a block diagram showing an example of a controller software configuration in an embodiment;

FIG. 4 is a diagram showing a data flow in a control unit in an embodiment;

FIG. 5 is a diagram showing a data flow in a control unit in an embodiment;

FIG. 6 is a diagram showing a data flow in a control unit in an embodiment;

FIG. 7 is a diagram showing an example of a case to perform area dividing on input image in an embodiment;

FIG. 8 is a flowchart showing document generation processing in an embodiment;

FIG. 9 is a diagram showing document generation and print processing from PDL data;

FIG. 10 is a flowchart showing retaining processing of an input data type in an embodiment;

FIG. 11 is a diagram showing a data structure of metadata;

FIG. 12 is a diagram showing a data structure of metadata;

FIG. 13 is a diagram showing an example of print-out data to be used in an embodiment;

FIG. 14 is a diagram showing an example of print-out data to be used in an embodiment;

FIG. 15 is a diagram showing an example of print-out data to be used in an embodiment;

FIG. 16 is a flowchart showing an example of document print processing in an embodiment;

FIG. 17A is a diagram showing an example of gradation drawing;

FIG. 17B is a diagram showing an example of gradation drawing;

FIG. 18A is a diagram showing an example of step-like figure drawing;

FIG. 18B is a diagram showing an example of step-like figure drawing;

FIG. 19 is a flowchart showing switching processing in optimization processing in an embodiment;

FIG. 20 is a diagram showing features of character strings generated from scan image and character strings generated from PDL data in an embodiment;

FIG. 21A is a diagram showing features of character strings generated from scan image and character strings generated from PDL data in an embodiment;

FIG. 21B is a diagram showing features of character strings generated from scan image and character strings generated from PDL data in an embodiment;

FIG. 22A is a diagram showing features of character strings generated from scan image and character strings generated from PDL data in an embodiment;

FIG. 22B is a diagram showing features of character strings generated from scan image and character strings generated from PDL data in an embodiment;

FIG. 23 is a diagram showing an example of a screen displayed on an operation section in an embodiment;

FIG. 24 is a diagram showing determination processing in search processing of an embodiment;

FIG. 25 is a diagram showing determination processing in universal format conversion of an embodiment;

FIG. 26A is a diagram showing a difference in drawing expression between PDL and a universal format;

FIG. 26B is a diagram showing a difference in drawing expression between PDL and a universal format; and

FIG. 26C is a diagram showing a difference in drawing expression between PDL and a universal format.

DESCRIPTION OF THE EMBODIMENTS

Hereinafter, the best mode for implementing the present invention will be described with reference to the drawings. Note that a constituent described in this embodiment is only an exemplification and is not intended to limit the scope of this invention thereto.

[Embodiment 1]

<Configuration of an Image Processing Apparatus>

There will be described a configuration of a 1D color MFP (Multi Function Peripheral), to which the present example is preferably applied, with reference to FIG. 1.

The 1D color MFP includes a scanner, a laser exposure section, a photo-sensitive drum, an imaging section, a fixing section, a paper feed/transfer section, and a printer controller (not shown in the drawing) which controls these sections.

The scanner reads a document image of a document placed on a platen optically by illuminating the document image, and converts the image into an electrical signal to generate image data.

The laser exposure section inputs a light beam, such as a laser beam, modulated according to the image data onto a rotating polygon mirror which rotates at a constant angular speed, and irradiates the photo-sensitive drum with a reflected scan beam thereof.

The imaging section drives and rotates the photo-sensitive drum, charges it with a charger, develops a latent image formed on the photo-sensitive drum by the laser exposure section with a toner, and transfers the toner image onto a sheet. Then, the imaging section performs a series of electrophotographic processes for imaging, such as a process to collect minute toners remaining on the photo-sensitive drum without being transferred. At this time, while the sheet is wound on a predetermined position of a transfer belt and rotates four times, each of development units (development stations), which have the toners of magenta (M), cyan (C), yellow (Y), and black (K), respectively, performs the above electrophotographic process repeatedly and sequentially in turn. After the four rotations, the sheet, to which the full-color four-toner image is transferred, is removed from a transfer drum and transferred to the fixing section.

The fixing section is configured with a combination of a roller and a belt and contains a heat source such as a halogen heater, and fuses the toners on the sheet, to which the toner image is transferred by the imaging section, by heat and pressure and fixes the toner image.

The paper feed/transfer section has one or more sheet storages represented by a sheet cassette or a paper deck, separates one sheet from a plurality of sheets stored in the sheet storage according to an instruction from the printer controller, and transfers the sheet to the imaging section and then to the fixing section. The sheet is wound on the transfer drum in the imaging section and transferred to the fixing section after having rotated four times. During the four rotations, each of the above YMCK toner images is transferred onto the sheet. Further, when images are to be formed on both sides of the sheet, the sheet having passed through the fixing section is controlled to be transferred along a transfer path which transfers the sheet to the imaging section again.

The printer controller communicates with an MFP controller which controls the entire MFP, carries out control according to an instruction thereof, and also, while managing a state in each of the above sections of scanner, laser exposure, imaging, fixing, and paper feed/transfer, provides instructions so that all of these sections operate smoothly in harmony.

<Configuration of the Controller Unit>

FIG. 2 is a block diagram showing a configuration example of the MFP controller unit (controller) in the present embodiment. In FIG. 2, a control unit 200 is connected with a scanner 201 of an image input device and a printer engine 202 of an image output device, and carries out controls for image data read-out and print-out. Further, the control unit 200 is connected to a LAN 10 or a public line 204 and thereby performs control for input and output of image information or device information via the LAN 10.

A CPU 205 is a central processing unit for controlling the entire MFP. A RAM 206 is a system work memory for the operation of the CPU 205 and also an image memory for storing input image data temporarily. Further, a ROM 207 is a boot ROM and stores a boot program of the system. An HDD 208 is a hard-disk drive and stores system software for various kinds of processing, the input image data, etc. An operation section I/F 209 is an interface part for an operation section 210 which has a display screen capable of displaying image data or the like, and outputs operation screen data to the operation section 210. Further, the operation section I/F 209 performs a function to transfer information, which is input by an operator from the operation section 210, to the CPU 205. A network interface 211 is realized by a LAN card or the like, for example, and is connected to the LAN 10 to input or output information from or to an external apparatus. Furthermore, a modem 212 is connected to the public line 204 and inputs or outputs information from or to an external apparatus. The above units are arranged on a system bus 213.

An image bus I/F 214 is an interface for connecting the system bus 213 and an image bus 215 which transfers the image data at a high speed, and is a bus bridge converting a data structure. To the image bus 215, there are connected a raster image processor (RIP) 216, a device I/F 217, a scanner image processing section 218, a printer image processing section 219, an image-edition image processing section 220, and a color management module (CMM) 230.

The raster image processor (RIP) 216 develops a page description language (PDL) code or vector data to be described below, into an image. The device I/F 217 connects the scanner 201 and the printer engine 202 to the control unit 200, and carries out synchronization/non-synchronization conversion of the image data.

Further, the scanner image processing section 218 performs various kinds of processing such as correction, modification, and edition, on scan image data input from the scanner 201. The printer image processing section 219 performs processing such as correction and resolution conversion according to the printer engine 202, on the image data to be printed out. The image-edition image processing section 220 performs various kinds of image processing such as image data rotation and image data compression/decompression processing. The color management module (CMM) 230 is a dedicated hardware module performing color conversion processing (also called color space conversion processing) based on a profile or calibration data on the image data. The profile is information such as a function to convert color image data represented by a color space which depends on a device into a color space (e.g., Lab or the like) which does not depend on the device. The calibration data is data for modifying a color reproduction characteristic of the scanner 201 or the printer engine 202 in the MFP.

<Configuration of Controller Software>

FIG. 3 is a block diagram showing a configuration of controller software which controls the operation of the MFP.

A printer interface 300 is a unit for input and output from and to the outside. A protocol controller 302 is a unit to perform communication with the outside by analyzing and transmitting a network protocol.

A vector data generation section (first conversion unit) 304 generates vector data, which is a drawing description without depending on a resolution, from a bit map image (vectorization).

A metadata generation section (second conversion unit) 306 generates subsidiary information, which is obtained in the process of the vectorization, as metadata. The metadata is additional information data for searching and is not necessary for drawing processing.

A PDL analysis section 308 is a unit to analyze the PDL data and to convert the PDL into an intermediate code (DisplayList, called “DL” hereinafter) which is a format to be processed more easily. The intermediate code data generated in the PDL analysis section 308 is transferred to a data drawing section 310 to be processed. The data drawing section 310 develops the above intermediate code data into bit map data, and the developed bit map data is drawn in a page memory 312 sequentially.

The page memory 312 is a volatile memory retaining the bit map data developed by a renderer temporarily.

A panel input/output controller controls input and output of the operation panel.

A document storage section 316 is a unit storing a data file including the vector data, DL data, and metadata for each block (job) unit of an input document, and is realized by a secondary storage unit such as a hard-disk. Note that the present example calls this data file “document”.

A scan controller 318 performs various kinds of processing such as correction, modification, and edition, on the image data input from the scanner.

A printer controller 320 converts contents of the page memory 312 into a video signal, and transfers an image to the printer engine section 322. The printer engine section 322 is a print mechanical section for forming a permanent visible image of the received video signal on a recording paper.

<Data Processing of the Controller Unit>

Next, there will be described how the vector data, DL data, and metadata, which compose the document, are generated.

FIG. 4, FIG. 5, and FIG. 6 show a data flow of the control unit 200 in the present embodiment.

FIG. 4 is a data flow in copy operation.

First, scan processing d1 converts a paper document set in a document exposure section into bit map data. Subsequently, vectorization processing d2 and metadata generation processing d4 generate the vector data and the accompanying metadata, each of which does not depend on a resolution, from the bit map data, respectively. Specific generation methods of the vector data and the metadata will be described hereinafter.

Next, document generation processing d3 generates the document of the vector data and the metadata which are associated with each other. Subsequently, DL generation processing d5 generates the DL data from the vector data in the document, and stores the generated DL data into the document and also sends the DL data to rendering processing d7 to develop the DL data into the bit map data.

Print processing d8 records the developed bit map onto a paper medium to form a printed matter. Note that, by setting the output printed matter in the document exposure section again, it is possible to carry out the processing from the scan processing d1.

FIG. 5 shows a specific data flow of the metadata generation processing d4 shown in FIG. 4. First, area dividing processing d1 carries out area dividing of the bit map.

The area dividing is processing to analyze the input bit map image data and to divide the image data into areas by a block of an object included in the image, and then to classify the areas by determining an attribute of each of the areas. The attributes include TEXT, PHOTO, LINE, PICTURE, TABLE, etc.

Here, FIG. 7 shows an example of a case to perform area dividing on the input image. A determination result 72 is a result of performing the area dividing on the input image 71. In the determination result 72, an area enclosed by a dotted line represents one object unit resulting from the image analysis, and the kind of the attribute attached to each of the objects is a determination result of the area dividing.

Among areas classified into the attributes, an area of the attribute TEXT is subjected to character recognition processing by OCR processing d2 and converted into a character string. That is, this character string is the one printed on the paper. On the other hand, among the areas classified into the attributes, the area of the attribute PHOTO is converted into image information through image information extraction processing d3. The image information is a character string representing a feature of the image, such as “flower” or “face”, for example. For extracting the image information, it is possible to use a general image processing technique such as an image feature quantity (pixel frequency or density composing the image) detection or face recognition.

The generated character string and image information are arranged into a data format to be described below by format conversion processing d4, and the metadata is generated.

FIG. 6 shows a data flow of PDL print. The PDL print is a printer operation of receiving and outputting the PDL data generated by a printer driver on a PC (Personal Computer), when application software on the PC instructs printing.

First, the received PDL data is analyzed by PDL data analysis processing d1 and the vector data is generated.

Next, DL data generation processing d2 generates the DL data from the vector data and the generated DL data is stored into the document and also sent to rendering processing d3 to be developed into the bit map. The developed bit map is recorded on a paper medium by print processing d4 to form a printed matter. The vector data and DL data generated in this process are stored into the document by document generation processing d6.

Further, from the bit map generated by the rendering processing d3, metadata generation processing d5 generates the character string and image information as the metadata in the same way as in the copy operation, and stores the metadata into the document.

Meanwhile, the PDL has various kinds such as LIPS (LBP Image Processing System) and PS (PostScript), and some of them include character string information. In this case, the metadata is generated from the character string in the PDL analysis and is stored into the document.

Next, document generation processing and print processing will be described using flowcharts.

FIG. 8 shows the document generation processing. This processing is one to receive the bit map data and to generate the document composed of the vector data, DL data, and metadata.

First, Step S801 performs the above described area dividing processing. Subsequently, Step S802 classifies types of areas (area attributes) into TEXT, GRAPHIC, and IMAGE and performs different processing for each of the types. FIG. 7 shows an example of classifying the area attributes into TEXT, PHOTO, LINE, PICTURE, and TABLE, but the area attributes of PHOTO and PICTURE and the area attributes of LINE and TABLE in FIG. 7 are classified into IMAGE and GRAPHIC, respectively.
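The regrouping of the FIG. 7 attributes into the three types used in Step S802 can be sketched as a simple lookup. This is only an illustration: the names `FINE_TO_COARSE` and `classify_area` are hypothetical and do not appear in the specification, and only the mapping itself is taken from the text.

```python
# Hypothetical names; only the mapping itself follows the description of Step S802.
FINE_TO_COARSE = {
    "TEXT": "TEXT",
    "PHOTO": "IMAGE",
    "PICTURE": "IMAGE",
    "LINE": "GRAPHIC",
    "TABLE": "GRAPHIC",
}


def classify_area(fine_attribute: str) -> str:
    """Map an area-dividing attribute (FIG. 7) to the type used in Step S802."""
    try:
        return FINE_TO_COARSE[fine_attribute]
    except KeyError:
        # Attributes not named in the text are rejected rather than guessed.
        raise ValueError(f"unknown area attribute: {fine_attribute!r}")
```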

If the area attribute is TEXT, the process goes to Step S803 and the OCR processing is performed, and then Step S804 extracts the character string. After that, Step S805 converts the character string into the metadata and the process goes to Step S806 which converts a recognized character outline into the vector data (vectorization).

Here, a little more description will be added.

The metadata generated from the character string is a sequence of character codes, but the sequence of the character codes is necessary information for keyword search.

However, the OCR processing can recognize the character code but cannot recognize a font such as “Mincho” and “Gothic”, a character size such as “10 pt” and “12 pt”, or character decoration such as “italic” and “bold”. Accordingly, it is necessary for drawing to retain the character outline as the vector data instead of the character code.

On the other hand, if the area attribute is IMAGE in Step S802, the process goes to Step S807 and the image information extraction processing is performed.

Step S807 detects the image feature using the general image processing technique such as the image feature quantity detection or the face recognition as described above. Subsequently, the process goes to Step S808 and the detected image feature is converted into the character string. This conversion is easy to perform by retaining a table of a feature parameter and the character string.

After that, Step S809 converts the character string into the metadata.

For the area attribute of IMAGE, the vectorization is not performed and the image data is retained as is in the vector data.

If the area attribute is GRAPHIC in Step S802, the process goes to Step S810 and the vectorization processing is performed.

Step S811 converts the metadata or the vector data into the document format.
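The branching of FIG. 8 described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the apparatus's implementation: the `Area` and `Document` types, the function `generate_document`, and the three callables passed in (`ocr`, `extract_feature`, `vectorize`) are all hypothetical stand-ins for processing that the specification describes only at the flowchart level.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List


@dataclass
class Area:
    attribute: str   # "TEXT", "IMAGE", or "GRAPHIC" (result of Step S802)
    bitmap: bytes    # placeholder for the pixel data of the area


@dataclass
class Document:
    vector_data: List[Any] = field(default_factory=list)
    metadata: List[str] = field(default_factory=list)


def generate_document(
    areas: Iterable[Area],
    ocr: Callable[[bytes], str],              # assumed OCR routine
    extract_feature: Callable[[bytes], str],  # assumed feature-to-string routine
    vectorize: Callable[[bytes], Any],        # assumed outline vectorizer
) -> Document:
    doc = Document()
    for area in areas:
        if area.attribute == "TEXT":
            doc.metadata.append(ocr(area.bitmap))            # S803 to S805
            doc.vector_data.append(vectorize(area.bitmap))   # S806: keep outlines
        elif area.attribute == "IMAGE":
            doc.metadata.append(extract_feature(area.bitmap))  # S807 to S809
            doc.vector_data.append(area.bitmap)              # image retained as is
        elif area.attribute == "GRAPHIC":
            doc.vector_data.append(vectorize(area.bitmap))   # S810
        else:
            raise ValueError(f"unexpected area attribute: {area.attribute!r}")
    return doc  # corresponds to assembling the document format in S811
```

Passing the recognizers in as callables keeps the sketch independent of any particular OCR or image-analysis library.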

FIG. 9 shows the document generation and print processing from the PDL data. This processing receives the PDL data and generates the document to output a print.

First, Step S901 analyzes the PDL data. If the metadata such as the character string information is found to be included in the PDL data during the analysis, the process goes to Step S909 and the information of the PDL is added to the metadata.

On the other hand, in Step S902, if the PDL data includes data other than the metadata such as the character string information, the process goes to Step S903 and the data is converted into the vector data. Then, the process goes to Step S904 and the document is generated.

Next, Step S905 generates the DL data and the process goes to Step S906 which adds the generated DL data to the document.

The above flow generates the document, and the whole processing is completed after the subsequent rendering processing in Step S907 and print processing to a paper medium in Step S908.

<Document Data Structure>

Next, a structure of the document will be described.

FIG. 11 shows the data structure of the document.

The document is data composed of a plurality of pages and includes data classified roughly into the vector data (a), the metadata (b), and the DL data (c), and has a hierarchical structure headed by a document header (x1). In detail, the vector data (a) includes a page header (x2), summary information (x3), and an object (x4), and the metadata (b) includes page information (x5) and detailed information (x6). Further, the DL data (c) includes a page header (x7) and an instruction for drawing development (x8). The document header (x1) describes a storing position of the vector data and a storing position of the DL data, and thereby the vector data and the DL data are associated with each other by the document header (x1).

The vector data (a) is drawing data which does not depend on a resolution (resolution-independent data) and the page header (x2) describes layout information such as a size and a direction of a page and the like. The object (x4) is linked to each of drawing data sets such as a line, a polygon, a Bezier curve, etc., and a plurality of objects is associated collectively with the summary information (x3). The summary information (x3) represents the features of the plurality of objects collectively, and describes the attribute information of the divided area explained in FIG. 7 and the like.

The metadata (b) is the additional information for searching, which is not related to the drawing processing. The page information (x5) area describes page information such as whether the metadata is generated from the bit map data or from the PDL data, for example, and the detailed information (x6) describes the character string (character code string) generated as the OCR information or the image information.

Further, the summary information (x3) of the vector data (a) refers to the metadata, and the detailed information (x6) can be found from the summary information (x3).

The DL data (c) is intermediate code data for the bit map development by the renderer. The page header (x7) describes a management table of drawing information (instruction) within a page and the like, and the instruction (x8) is composed of the drawing information which depends on a resolution.
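The hierarchy described above could be modeled roughly as follows. This is a sketch under assumptions: every class and field name is hypothetical, the choice of an integer index as the link from summary information (x3) to detailed information (x6) is an assumption, and the specification does not define any concrete storage layout.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VectorObject:            # object (x4): a line, polygon, Bezier curve, etc.
    kind: str
    points: List[Tuple[float, float]]


@dataclass
class Summary:                 # summary information (x3)
    area_attribute: str        # attribute of the divided area
    objects: List[VectorObject]
    detail_index: int          # assumed link to detailed information (x6)


@dataclass
class VectorPage:              # page header (x2) plus its summaries
    width: float
    height: float
    orientation: str
    summaries: List[Summary]


@dataclass
class MetadataPage:            # page information (x5) and detailed information (x6)
    input_data_type: str
    details: List[str]


@dataclass
class DLPage:                  # page header (x7) and instructions (x8)
    resolution_dpi: int        # DL data depends on a resolution
    instructions: List[str]


@dataclass
class Document:                # document header (x1) ties the three parts together
    vector_pages: List[VectorPage]
    metadata_pages: List[MetadataPage]
    dl_pages: List[DLPage]


def find_detail(doc: Document, page: int, summary: int) -> str:
    """Follow summary information (x3) to its detailed information (x6)."""
    index = doc.vector_pages[page].summaries[summary].detail_index
    return doc.metadata_pages[page].details[index]
```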

<Retaining of the Input Data Type>

Next, retaining processing of the input data type will be described.

The flowchart in FIG. 10 shows the retaining processing of the input data type.

This processing is performed in the generation of each document explained in FIG. 4 and FIG. 6.

First, Step S1001 acquires the metadata of the generated document. Subsequently, Step S1002 determines whether this document has been generated from the bit map image. When this flow starts directly after the data flow shown in FIG. 5, this document has been generated from the bit map image.

If, in Step S1002, the document is determined to have been generated from the bit map image, the process goes to Step S1003, which sets the input data type to “full-page image” in the page information of the metadata acquired in Step S1001. On the other hand, if the document is determined to have been generated from the PDL data in Step S1002, the process goes to Step S1004, which sets the input data type to “PDL”. Note that, even when the input data type has already been set to “PDL”, Step S1003 overwrites this “PDL” with “full-page image”. Accordingly, the input data type of an image generated from the PDL data by the rendering is changed to “full-page image” by the processing in this step.

Here, detail of the input data type will be described using FIG. 12.

FIG. 12 shows a data structure of the metadata shown in FIG. 11.

In FIG. 12, Symbols mp1 and mp2 indicate the metadata of the first page and the second page, respectively. The metadata indicated by Symbol mp1 is composed of a page ID md1, the input data type md2, and detailed meta information a2 which is obtained by analyzing the inside of the page. The metadata indicated by Symbol mp2 has a similar structure, and all the pages have the same structure. In this manner, one input data type is retained in the metadata of each page.
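The per-page metadata of FIG. 12 and the retaining steps of FIG. 10 can be sketched as follows. This is only a minimal illustration, not the implementation of the apparatus: the class `PageMetadata`, its field names, and the function `retain_input_data_type` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FULL_PAGE_IMAGE = "full-page image"
PDL = "PDL"


@dataclass
class PageMetadata:
    """One metadata record per page (hypothetical layout modeled on FIG. 12)."""
    page_id: int                                     # corresponds to md1
    input_data_type: Optional[str] = None            # corresponds to md2
    detail: List[str] = field(default_factory=list)  # detailed meta information


def retain_input_data_type(meta: PageMetadata, from_bitmap: bool) -> PageMetadata:
    """Steps S1002 to S1004: record which kind of input produced the page."""
    if from_bitmap:
        # S1003: any earlier "PDL" value is overwritten.
        meta.input_data_type = FULL_PAGE_IMAGE
    else:
        # S1004
        meta.input_data_type = PDL
    return meta


# A page that came from PDL data and was later rendered into a bit map.
page = retain_input_data_type(PageMetadata(page_id=1), from_bitmap=False)
page = retain_input_data_type(page, from_bitmap=True)
```

After the second call the page carries “full-page image”, which mirrors the overwrite described for Step S1003.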

<Switching of Processing by the Input Data Type>

Next, features of the scan image and the PDL data will be described using FIG. 13, FIG. 14, and FIG. 15. FIG. 13 shows an example of print-out data to be used for the description here.

A round figure o2 is laid over a black rectangle o1, and in a part c1, where o1 and o2 overlap with each other, the black of the ground shows through.

FIG. 14 shows an example for a case of inputting this picture from the scan image, and FIG. 15 shows an example for a case of inputting the picture from the PDL data.

The picture input from the scan image as an image is subjected to the vectorization processing as described above. In the area dividing explained in FIG. 5, the overlapped area is separated into a different area, so that the picture is first divided into three areas s1, s2, and s3 as shown in FIG. 14. Usually, each of the comparatively simple areas s1 and s3 is converted into the vector data composed of a dot string, and the comparatively complicated area s2 is converted into the bit map. The important feature of this dividing is that the areas s1, s2, and s3 are converted into drawing data sets which do not overlap with one another.

On the other hand, the picture input as the PDL data is represented by two figures, a black rectangle p1 and a round figure p2, as shown in FIG. 15, and the overlap is realized by specifying that the round figure p2 has a transmittance of 50%. That is, the PDL data is not divided into areas and is a higher-level abstract representation.

Further, the areas s1, s2, and s3 area-divided in FIG. 14 and the figures p1 and p2 of FIG. 15 are converted into block structures in the document data structure, as a1 and a2 shown in FIG. 11.

FIG. 16 shows document print processing. This processing prints out the generated document.

First, Step S1601 receives the document data, and Step S1602 acquires the metadata in the document data. Subsequently, Step S1609 determines whether the input data type stored in the metadata is “full-page image” or not. If the input data type is “full-page image”, the process divides the page data into blocks and allots a thread to each of the blocks (each part of the page) in Step S1608, and then goes to Step S1603. On the other hand, if the input data type is determined in Step S1609 not to be “full-page image”, that is, to be “PDL”, the process goes directly to Step S1603 and is continued.

Step S1603 generates the DL data from the vector data in the document. Subsequently, Step S1604 adds the generated DL data to the document, and Step S1605 renders the DL data into the bit map. Finally, Step S1606 performs print processing onto a paper medium and the process is completed.

That is, by processing the threads with a plurality of processors in the CPU 205, respectively, it is possible to perform parallel processing when the input data type is “full-page image” and thereby to realize high speed processing.
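The switching in Steps S1609 and S1608 can be sketched as follows. This is a minimal illustration under stated assumptions: `generate_dl`, `render`, and `render_page` are hypothetical stand-ins, blocks are modeled as lists of object names, and the "bit map" is just a list of strings.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_dl(block):
    """S1603 stand-in: turn vector objects into resolution-dependent instructions."""
    return [("draw", obj) for obj in block]


def render(dl):
    """S1605 stand-in: develop DL instructions into (fake) bit map fragments."""
    return ["bitmap:" + obj for _, obj in dl]


def render_page(input_data_type, blocks, workers=4):
    """Render one page, in parallel only for non-overlapping scan blocks."""
    def process(block):
        return render(generate_dl(block))

    if input_data_type == "full-page image":
        # S1608: one task per block; map() returns results in block order.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            parts = list(pool.map(process, blocks))
    else:
        # "PDL": objects may overlap, so they are processed in order.
        parts = [process(block) for block in blocks]
    return [fragment for part in parts for fragment in part]


scan_blocks = [["s1"], ["s2"], ["s3"]]
page_bitmap = render_page("full-page image", scan_blocks)
```

The parallel branch is safe here because the scan-derived areas do not overlap with one another, so no block depends on the drawing result of another. In standard CPython builds, threads mainly illustrate the control flow; an actual speed-up for CPU-bound rendering would need processes or native code.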

[Embodiment 2]

While Embodiment 1 has realized the parallel processing by utilizing the input data type, this embodiment realizes higher-speed processing by utilizing the input data type to switch the processing so that unnecessary processing is not performed. First, an example of processing to optimize the vector data will be described using FIG. 17A, FIG. 17B, FIG. 18A, and FIG. 18B.

FIG. 17A and FIG. 17B show examples of gradation drawing.

FIG. 17A represents gradation by connecting parallelograms whose colors differ slightly from one another, and each of the parallelograms is represented by four points (x1, y1), (x2, y2), (x3, y3), and (x4, y4). In such data, the number of the parallelograms increases as the gradation becomes smoother, and the dot string data becomes huge. Accordingly, optimization processing is usually performed to put the parallelograms together into one figure as shown in FIG. 17B by checking the state of the connection and the rate of color change among the parallelograms.
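The merging of gradation figures can be illustrated with a simplified sketch. It is an assumption-laden reduction, not the method of FIG. 17B itself: each parallelogram is reduced to a one-dimensional strip `(x_start, x_end, gray)`, and the function name `merge_gradient_strips` and the `tolerance` parameter are hypothetical.

```python
def merge_gradient_strips(strips, tolerance=0):
    """Merge connected strips whose gray level changes at a steady rate.

    strips: list of (x_start, x_end, gray), sorted by x_start.
    Returns one (x_start, x_end, gray_first, gray_last) per merged figure.
    """
    if not strips:
        return []
    merged = []
    run = [strips[0]]
    for strip in strips[1:]:
        prev = run[-1]
        connected = strip[0] == prev[1]          # state of the connection
        step = strip[2] - prev[2]                # rate of color change
        run_step = run[1][2] - run[0][2] if len(run) > 1 else step
        if connected and abs(step - run_step) <= tolerance:
            run.append(strip)
        else:
            merged.append((run[0][0], run[-1][1], run[0][2], run[-1][2]))
            run = [strip]
    merged.append((run[0][0], run[-1][1], run[0][2], run[-1][2]))
    return merged


smooth = [(0, 1, 10), (1, 2, 12), (2, 3, 14), (3, 4, 16)]
result = merge_gradient_strips(smooth)
```

Four connected strips with a constant color step collapse into a single figure that stores only its extent and its first and last gray levels.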

FIG. 18A and FIG. 18B show examples of step-like figure drawing.

In FIG. 18A, a step-like diamond shape is represented by connecting rectangles having the same height (usually, one pixel) and different widths, and one rectangle is represented by four points (x1, y1), (x2, y2), (x3, y3), and (x4, y4). Such data also has a huge dot string, but, if the height is the same, the data can be represented only by differences in the x direction from (x1, y1) to (x2, y2) as shown in FIG. 18B.
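One possible reading of the difference representation is sketched below. The exact encoding of FIG. 18B is not given in the text, so the layout used here (a starting row plus per-row differences of the left and right edges) and the names `encode_steps` and `decode_steps` are assumptions for illustration.

```python
def encode_steps(rects, height=1):
    """Encode equal-height rows as a start row plus x-direction differences.

    rects: list of (x_left, x_right, y_top), stacked top to bottom.
    """
    if not rects:
        raise ValueError("at least one rectangle is required")
    x_left, x_right, y_top = rects[0]
    deltas = []
    for i, (left, right, top) in enumerate(rects[1:], start=1):
        if top != y_top + i * height:
            raise ValueError("rows must be contiguous and of equal height")
        prev_left, prev_right, _ = rects[i - 1]
        deltas.append((left - prev_left, right - prev_right))
    return {"start": (x_left, x_right, y_top), "height": height, "deltas": deltas}


def decode_steps(encoded):
    """Rebuild the full list of rows from the difference representation."""
    left, right, top = encoded["start"]
    rects = [(left, right, top)]
    for d_left, d_right in encoded["deltas"]:
        left, right, top = left + d_left, right + d_right, top + encoded["height"]
        rects.append((left, right, top))
    return rects


diamond = [(4, 6, 0), (3, 7, 1), (2, 8, 2), (3, 7, 3), (4, 6, 4)]
encoded = encode_steps(diamond)
```

Each row after the first is stored as two small integers instead of four coordinate pairs, and decoding restores the original rows exactly.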

The optimization processing such as that shown in FIG. 17A, FIG. 17B, FIG. 18A, or FIG. 18B can optimize the vector data to a large extent but takes a long time.

Here, such representation usually appears in the PDL data which is output from an application.

Accordingly, it is possible to realize higher speed processing by utilizing the input data type and omitting this optimization processing for the scan image.

FIG. 19 shows a flowchart of switching processing in the optimization processing.

First, Step S1901 acquires the metadata in the document.

Next, Step S1902 detects the input data type stored in the metadata acquired in Step S1901. If the input data type is “PDL”, the process goes to Step S1903 and performs the above described optimization processing. On the other hand, if the input data type is determined to be “full-page image” in Step S1902, the optimization processing is not performed and the process is terminated without any other processing.
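The switching of FIG. 19 amounts to a single branch on the input data type, as in the following minimal sketch. The function name `optimize_if_pdl`, the dictionary-based metadata, and the stand-in optimizer `fake_optimize` are hypothetical and serve only to make the control flow concrete.

```python
def optimize_if_pdl(metadata, vector_data, optimize):
    """FIG. 19 sketch: run the costly optimization only for PDL-derived pages."""
    if metadata["input_data_type"] == "PDL":   # S1902
        return optimize(vector_data)           # S1903
    return vector_data                         # "full-page image": skipped


calls = []


def fake_optimize(data):
    """Stand-in for the optimization of FIG. 17B / FIG. 18B."""
    calls.append(list(data))
    return sorted(set(data))


pdl_result = optimize_if_pdl({"input_data_type": "PDL"}, ["b", "a", "a"], fake_optimize)
scan_result = optimize_if_pdl(
    {"input_data_type": "full-page image"}, ["b", "a", "a"], fake_optimize
)
```

The optimizer is invoked once, for the PDL page only; the scan page is returned unchanged.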

[Embodiment 3]

This embodiment improves convenience of document search by utilizing the input data type.

First, features of the character string generated from the scan image and the character string generated from the PDL data will be described using FIG. 20, FIG. 21A, FIG. 21B, FIG. 22A, and FIG. 22B.

FIG. 20 shows an example of print-out data to be used for the description here.

The picture of FIG. 20 includes standard type characters st2 and st3 and italic-modified type characters st1 and st4.

FIG. 21A and FIG. 21B show examples of cases of inputting this picture from the PDL data.

The standard type part of the character string input as the PDL data is input as the character code as shown in FIG. 21B. On the other hand, the italic type part is input as a download character (bit map character) as shown in FIG. 21A. Whether such a decorated character is represented by a decoration-modified bit map character or by an italic modification instruction and the character code generally depends on the application used for document generation. The character part “PDL” input as the download characters is not represented by character codes and therefore cannot be recognized as a character string.

On the other hand, when the picture is input from the scan image, the inside of the page is uniformly subjected to the OCR processing and all the characters can be recognized as the character string.

FIG. 22A and FIG. 22B show the document data when the picture of FIG. 20 is input by each of the PDL data and the scan image.

FIG. 22A shows the document data for the case of input by the PDL data, and the document data is divided into the download character “PDL” (t1) and the text character “data has various” (t2) to be stored. The metadata obtained here is only the character string obtained from the character code of t2. Therefore, the metadata mt stores only “data”, “has”, and “various”, and the input data type of the page information mp becomes “PDL”.

FIG. 22B shows the document data for the case of input from the scan image. The scan image is subjected to the vectorization processing as described above, and thereby a character shape is converted into a vector dot string (t1) and the character string is extracted by the OCR processing at the same time.

Therefore, the metadata mt stores “PDL”, “data”, “has”, and “various”, and the input data type of the page information mp becomes “full-page image”.

In this manner, the character string stored in the metadata has a feature that some information is lost in the generation from the PDL data compared to the generation from the scan image.

Accordingly, this method may confuse an operator using a search function which picks up documents including an arbitrary character string by carrying out a full-text search of the documents. This is because, when the documents are searched for the character string “PDL”, the document shown in FIG. 22B (input from the scan image) is hit but the document shown in FIG. 22A (input by the PDL) is not hit, notwithstanding the same print-out result (FIG. 20).

FIG. 23 shows an example of a screen displayed on the display part of the operation section in the image processing apparatus, when the operator instructs search of a document stored in a box. Symbol u1 in FIG. 23 indicates a display area for displaying a list of files in the box.

Symbol u2 indicates a composite button. When the operator selects a plurality of files on the screen, push-down of the composite button u2 connects the selected files to generate one file.

Symbol u3 indicates a search button. When the operator desires to search the files, pushes down the search button, and inputs any character string (not shown in the drawing), the files in the box are searched and a list of the result is displayed in the display area u1.

FIG. 24 shows a flowchart of search processing switching in the present embodiment.

This processing switches whether the generated document is to be searched or not.

First, Step S2401 acquires the metadata in the document data. Subsequently, Step S2402 determines whether the input data type stored in the metadata is “PDL” or not. If the input data type is “PDL”, the process is terminated without any processing. On the other hand, if the input data type is determined in Step S2402 not to be “PDL”, that is, to be “full-page image”, the process goes to Step S2403, which continues the search processing of the metadata in the page.

That is, it is possible to avoid the confusion of the operator by limiting the search object to a document which has the input data type of scan image.
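The search switching of FIG. 24 can be sketched with the two documents of FIG. 22A and FIG. 22B. The dictionary layout, the document names `fig22a` and `fig22b`, and the function `search_box` are hypothetical and only illustrate the branch in Step S2402.

```python
def search_box(documents, keyword):
    """FIG. 24 sketch: search only documents generated from the scan image."""
    hits = []
    for doc in documents:
        if doc["input_data_type"] == "PDL":   # S2402: excluded from the search
            continue
        if keyword in doc["keywords"]:        # S2403: search the metadata
            hits.append(doc["name"])
    return hits


box = [
    {   # FIG. 22A: the download characters "PDL" never reach the metadata
        "name": "fig22a",
        "input_data_type": "PDL",
        "keywords": ["data", "has", "various"],
    },
    {   # FIG. 22B: OCR recovers every word on the page
        "name": "fig22b",
        "input_data_type": "full-page image",
        "keywords": ["PDL", "data", "has", "various"],
    },
]
```

With this restriction, a search for “data” and a search for “PDL” return the same set of documents, so the operator does not see one document appear for some words and vanish for others.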

[Embodiment 4]

This embodiment realizes more efficient conversion processing to the universal format by utilizing the input data type.

The structure of the document generated from the scan image and the PDL data is suitable for the print or the search. However, for displaying (previewing) the document on a client PC using an application, it is necessary to convert the document into a universal format such as the PDF (Portable Document Format) format. It is not easy, however, to convert the document generated from the PDL data into the universal format.

For example, the PDL data, such as LIPS (LBP Image Processing System) data, is usually subjected to clipping (shape cutout) processing using logical computation called ROP (Raster Operation). However, the drawing model (drawing representation) of the PDF, which is a universal format, does not have the ROP. Accordingly, it is necessary to replace the ROP processing by another drawing expression such as a clip dot string, but the drawing representation generated by the replacement becomes very redundant.

The above clipping processing will be described for a specific case using FIG. 26A, FIG. 26B, and FIG. 26C. FIG. 26A shows an example of print-out data to be used for the description, and shows a drawing of a clipped image. FIG. 26B shows an example of the drawing representation by the PDL for drawing the data of FIG. 26A. First, an image is drawn in XOR and then a black clipping image is drawn in AND. Finally, the same image as the former image is drawn in XOR and thereby the result of FIG. 26A is obtained. FIG. 26C shows an example of the drawing representation by the universal format for drawing the data of FIG. 26A. By setting a dot string having a cutout shape as a clip area and drawing an image there, the result shown in FIG. 26A is obtained.
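The equivalence between the XOR-AND-XOR sequence and a clip area can be checked on a few pixels. The sketch below assumes 8-bit gray values held in flat lists, with black as 0x00 and white as 0xFF; the function names `rop_clip` and `clip_area_draw` are hypothetical.

```python
def rop_clip(dest, image, mask):
    """Draw `image` onto `dest` through a black cutout using XOR, AND, XOR."""
    step1 = [d ^ s for d, s in zip(dest, image)]    # image drawn in XOR
    step2 = [d & m for d, m in zip(step1, mask)]    # black clipping image in AND
    return [d ^ s for d, s in zip(step2, image)]    # same image drawn in XOR


def clip_area_draw(dest, image, inside):
    """Universal-format style: draw `image` only where the clip area is set."""
    return [s if k else d for d, s, k in zip(dest, image, inside)]


page = [0xFF, 0xFF, 0xFF, 0xFF]     # white page
image = [0x12, 0x34, 0x56, 0x78]    # image to be clipped
mask = [0xFF, 0x00, 0x00, 0xFF]     # black (0x00) marks the cutout shape

rop_result = rop_clip(page, image, mask)
clip_result = clip_area_draw(page, image, [m == 0x00 for m in mask])
```

Where the mask is black, the AND clears the pixel and the final XOR leaves the image value; where the mask is white, the two XOR operations cancel and the page is unchanged. Both functions therefore produce the same pixels, but only the second has a direct counterpart in a drawing model without ROP.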

FIG. 25 shows a flowchart of switching in the universal format conversion processing (data conversion processing) according to the present example.

This processing switches between execution and non-execution of the rendering processing in the conversion of the generated document into the universal format.

First, Step S2501 acquires the metadata in the document data. Subsequently, Step S2502 determines whether the input data type stored in the metadata is “PDL” or not. If the input data type is not “PDL”, the process goes directly to Step S2505, which performs the conversion processing to the universal format.

On the other hand, if the input data type is determined to be “PDL”, the process goes to Step S2503, which performs the rendering processing (bit map development) on the corresponding page. That is, Step S2503 converts the page from the PDL data into image data. The PDL data subjected to the rendering processing becomes full-page image data, the same as the scan image, and is subjected to scan data processing in Step S2504. The scan data processing is in essence the same as the processing for the scan image described above. After that, the process goes to Step S2505, which carries out the conversion to the universal format. This conversion is the same processing as that for the case in which the process goes from Step S2502 directly to Step S2505 (i.e., the case in which the input data type is the scan image).

That is, in the present example, the PDL data is first subjected to the rendering processing in the conversion into the universal format, and can thereby be subjected to the same conversion processing as that for the scan image, which prevents the PDL data from being converted into a redundant drawing representation in the universal format.
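The flow of FIG. 25 can be sketched as follows. The three callables passed in are hypothetical stand-ins for the rendering processing, the scan data processing, and the one-to-one format conversion; strings are used in place of real drawing data so that the order of the steps is visible.

```python
def convert_page(page, render, vectorize, to_universal):
    """FIG. 25 sketch: render PDL pages first, then convert like a scan page."""
    if page["input_data_type"] == "PDL":            # S2502
        bitmap = render(page["vector_data"])         # S2503: bit map development
        page = {                                     # S2504: scan data processing
            "input_data_type": "full-page image",
            "vector_data": vectorize(bitmap),
        }
    return to_universal(page["vector_data"])         # S2505


trace = []


def fake_render(vector_data):
    trace.append("render")
    return "bitmap(" + vector_data + ")"


def fake_vectorize(bitmap):
    trace.append("vectorize")
    return "vector(" + bitmap + ")"


def fake_to_universal(vector_data):
    trace.append("convert")
    return "pdf(" + vector_data + ")"


pdl_out = convert_page(
    {"input_data_type": "PDL", "vector_data": "rop-drawing"},
    fake_render, fake_vectorize, fake_to_universal,
)
scan_out = convert_page(
    {"input_data_type": "full-page image", "vector_data": "areas"},
    fake_render, fake_vectorize, fake_to_universal,
)
```

The PDL page passes through rendering and vectorization before the shared conversion step, while the scan page goes to the conversion directly.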

Note that the conversion processing into the universal format performed in Step S2505 just converts the vector data of the document into the drawing representation of the universal format in one-to-one correspondence, and description thereof will be omitted.

[Other Embodiments]

While various embodiments have been described in detail hereinabove, the present invention may be applied to a system configured with a plurality of devices and also applied to an apparatus configured with a single device, such as a scanner, a printer, a PC, a copy machine, a composite machine and a facsimile machine, for example.

The present invention is achieved also by a method to supply a software program realizing each of the functions in the foregoing embodiments, directly or remotely to a system or an apparatus, and to cause a computer included in the system or the like to read and execute the supplied program code.

Accordingly, the program code itself, which is installed into the computer for causing the computer to realize the functions and processing of the present invention, realizes the present invention. That is, the computer program itself, for realizing the above functions and processing, falls within the scope of the present invention.

In this case, the computer program may be of any program type, such as an object code, a program executed by an interpreter, or script data supplied to an OS, as long as a program function is included therein.

Computer-readable recording media for supplying the program include, for example, a flexible disk, a hard disk, an optical disk, a magneto-optical disk, an MO, a CD-ROM, a CD-R, a CD-RW, etc. In addition, the computer-readable recording media also include a magnetic tape, a non-volatile memory card, a ROM, a DVD (DVD-ROM and DVD-R), etc.

Further, the program may be downloaded from a website on the Internet or an intranet using a browser of a client computer. That is, the computer program itself of the present invention, or a file including the compressed computer program with an auto-install function, may be downloaded from the website into a recording medium such as a hard disk or the like. Moreover, the present invention is realized by a method to divide the program code composing the program of the present invention into a plurality of files and to download each of the files from a different website. That is, a www server, which enables a plurality of users to download the program file for causing the computer to realize the functions and processing of the present invention, is sometimes a constituent of the present invention.

Still further, the program of the present invention may be encrypted and stored in a recording medium such as a CD-ROM and the like, and distributed to users. In this case, only a user who has cleared a certain condition may download key information to break the encryption from a website via the Internet or an intranet, decrypt the encrypted program using the key information for execution, and install the program into the computer.

Moreover, the computer may realize the functions of the foregoing embodiments by executing the read-out program. Here, according to an instruction of the program, an OS operating on the computer or the like may perform a part of or the whole actual processing. Obviously, this case also can realize the functions of the foregoing embodiments.

Moreover, the program read out from the recording medium may be written into a memory which is provided to a function extension board inserted into the computer or a function extension unit connected to the computer. According to an instruction of the program, a CPU or the like, which is provided to the function extension board or the function extension unit, may perform a part of or the whole actual processing. In this manner, the functions of the foregoing embodiments may be realized.

Further, while Embodiment 1 allots the document data to a thread for each unit of the blocks, the processing unit is not limited to this block as far as the processing unit is the unit in the area dividing processing of the scan image. For example, when the area dividing processing is not planar dimension dividing but layer dividing, the document data may be allotted for each of the layers.

Still further, while Embodiment 4 performs the rendering processing in the conversion to the universal format, the rendering processing may be performed in the background of the storing process for the box. Thereby, it is possible to reduce the conversion time for the universal format.

Moreover, while Embodiment 4 performs the rendering processing in the conversion to the universal format, it may be an option to utilize attribute information obtained in the rendering processing (attribute bits such as TEXT, GRAPHIC, IMAGE, etc. which are output usually in the rendering) for the area dividing in the scan image processing.

While the present invention has been discussed with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.

This application claims the benefit of Japanese Patent Application No. 2008-030587 filed Feb. 12, 2008, which is hereby incorporated by reference herein in its entirety.

1. An image processing apparatus, comprising: an input component constructed to input scan image data from a scanner; a first conversion component constructed to convert at least a part of an area in the scan image data input by the input component into vector data which does not depend on a resolution of the input component; a second conversion component constructed to generate metadata by executing at least one of a character recognition processing and an image recognition processing for the scan image data; a first document generation component constructed to generate a first document which includes the vector data converted by the first conversion component and the metadata generated by the second conversion component; a reception component constructed to receive PDL data from an external apparatus; a PDL analysis component constructed to generate vector data and metadata by analyzing the PDL data received by the reception component; a second document generation component constructed to generate a second document which includes the vector data generated by the PDL analysis component and the metadata generated by the PDL analysis component; a data type set component constructed to set data type information into the metadata of each document, the data type information indicating which of the scan image data and the PDL data has been used for generating each document; and a switching component constructed to switch a data processing to be executed for the document based on the data type information set in the metadata of the document.
2. The image processing apparatus according to claim 1, wherein the data processing is a processing for rendering the vector data included in the document into a bitmap, and wherein, in a case that the data type information of the metadata is the scan image data, the switching component divides the vector data included in the document into a plurality of division units, and renders the division units using a plurality of processors.
3. The image processing apparatus according to claim 1, further comprising an optimization component constructed to execute an optimizing process of the vector data included in a document, wherein the switching component executes the optimizing process for the vector data included in the document when the data type information of the metadata is the PDL data, and omits the optimizing process when the data type information is the scan image data.
4. The image processing apparatus according to claim 1, further comprising a format conversion component constructed to convert a format of a document into a universal data format, wherein, in a case that the data type information of the metadata is the PDL data, the switching component renders the document into a bitmap, executes the conversion process of the first conversion component for the rendered bitmap, and executes a format conversion process of the format conversion component for the document for which the conversion process of the first conversion component has been executed.
5. The image processing apparatus according to claim 1, wherein, in a case that a search process is executed using the metadata of the document, the switching component performs the search process for a document that has the data type information of the scan image data, and does not perform the search process for a document that has the data type information of PDL data.
6. An image processing method, comprising the steps of: an input step of an input component inputting scan image data from a scanner; a first converting step of converting at least a part of an area of the scan image data input by the input component into vector data which does not depend on a resolution of the input component; a second conversion step of generating metadata by executing at least one of a character recognition processing and an image recognition processing for the scan image data; a first document generation step of generating a first document which includes the vector data converted in the first conversion step and the metadata generated in the second conversion step; a reception step of receiving PDL data from an external apparatus; a PDL analysis step of generating vector data and metadata by analyzing the PDL data received in the reception step; a second document generation step of generating a second document which includes the vector data generated in the PDL analysis step and the metadata generated in the PDL analysis step; a data type set step of setting data type information into the metadata of each document, the data type information indicating which of the scan image data and the PDL data has been used for generating each document; and a switching step of switching a data processing to be executed for the document based on the data type information set in the metadata of the document.
7. A non-transitory computer-readable recording medium recording a program for causing a computer to execute a method comprising the steps of: an input step of an input component inputting scan image data from a scanner; a first converting step of converting at least a part of an area of the scan image data input by the input component into vector data which does not depend on a resolution of the input component; a second conversion step of generating metadata by executing at least one of a character recognition processing and an image recognition processing for the scan image data; a first document generation step of generating a first document which includes the vector data converted in the first conversion step and the metadata generated in the second conversion step; a reception step of receiving PDL data from an external apparatus; a PDL analysis step of generating vector data and metadata by analyzing the PDL data received in the reception step; a second document generation step of generating a second document which includes the vector data generated in the PDL analysis step and the metadata generated in the PDL analysis step; a data type set step of setting data type information into the metadata of each document, the data type information indicating which of the scan image data and the PDL data has been used for generating each document; and a switching step of switching a data processing to be executed for the document based on the data type information set in the metadata of the document.